Overview

Monitoring the distribution and change in land cover types can help us understand the impacts of phenomena like climate change, natural disasters, deforestation, and urbanization. Determining land cover types over large areas is a major application of remote sensing because we are able to distinguish different materials based on their spectral reflectance.

Classifying remotely sensed imagery into landcover classes enables us to understand the distribution and change in landcover types over large areas. There are many approaches for performing landcover classification – supervised approaches use training data labeled by the user, whereas unsupervised approaches use algorithms to create groups which are identified by the user afterward.

We will be using a form of supervised classification, a decision tree classifier. Decision trees classify pixels using a series of conditions based on values in spectral bands. These conditions (or decisions) are developed based on training data. In this lab we will create a land cover classification for southern Santa Barbara County based on multi-spectral imagery and data on the location of 4 land cover types:

  • green vegetation
  • dry grass or soil
  • concrete
  • water

credit: this exercise is based on a materials developed by Chris Kibler.

Highlights

  • load and process Landsat scene
  • crop and mask Landsat data to study area
  • extract spectral data at training sites
  • train and apply decision tree classifier
  • plot results

Data

Landsat 5 Thematic Mapper

  • Landsat 5
  • 1 scene from September 25, 2007
  • bands: 1, 2, 3, 4, 5, 7
  • Collection 2 surface reflectance product

Study area and training data

  • polygon representing southern Santa Barbara county
  • polygons representing training sites
    • type: character string with land cover type

Exploration

Process data

Load packages and set working directory

We’ll be working with vector and raster data, so will need both sf and terra. To train our classification algorithm and plot the results, we’ll use the rpart and rpart.plot packages.

library(sf)
library(terra)
library(here)
library(dplyr)
library(rpart)
library(rpart.plot)
library(tmap)

Load Landsat data

Create a raster stack using the rast function. We’ll then update the names of the layers to match the spectral bands and plot a true color image.

# list files for each band, including the full file path
filelist <- list.files("data/landsat-data/", full.names = TRUE)
# read in and store as a raster stack
landsat <- rast(filelist)

landsat
## class       : SpatRaster 
## dimensions  : 7251, 8111, 6  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 115485, 358815, 3724485, 3942015  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 11N (EPSG:32611) 
## sources     : LT05_L2SP_042036_20070925_20200829_02_T1_SR_B1.TIF  
##               LT05_L2SP_042036_20070925_20200829_02_T1_SR_B2.TIF  
##               LT05_L2SP_042036_20070925_20200829_02_T1_SR_B3.TIF  
##               ... and 3 more source(s)
## names       : LT05_~SR_B1, LT05_~SR_B2, LT05_~SR_B3, LT05_~SR_B4, LT05_~SR_B5, LT05_~SR_B7
# update layer names to match band
names(landsat) <- c("blue", "green", "red", "NIR", "SWIR1", "SWIR2")

landsat
## class       : SpatRaster 
## dimensions  : 7251, 8111, 6  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 115485, 358815, 3724485, 3942015  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 11N (EPSG:32611) 
## sources     : LT05_L2SP_042036_20070925_20200829_02_T1_SR_B1.TIF  
##               LT05_L2SP_042036_20070925_20200829_02_T1_SR_B2.TIF  
##               LT05_L2SP_042036_20070925_20200829_02_T1_SR_B3.TIF  
##               ... and 3 more source(s)
## names       : blue, green, red, NIR, SWIR1, SWIR2
# plot true color image
plotRGB(landsat, r = 3, g = 2, blue = 1, stretch = "lin")
## Warning in plot.window(...): "blue" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "blue" is not a graphical parameter
## Warning in title(...): "blue" is not a graphical parameter

Load study area

Utilize a .shp file that defines the area we are exploring in order to focus our analysis on the southern portion of Santa Barbara County that we have training data for.

# read in shapefile for southern portion of SB county
SB_county_south <- st_read("data/SB_county_south.shp")
## Reading layer `SB_county_south' from data source 
##   `/Users/melissawidas/Documents/Github/santa_barbarara_land_classification/data/SB_county_south.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 18 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -120.2327 ymin: 34.33603 xmax: -119.5757 ymax: 34.53716
## Geodetic CRS:  NAD83
# project to match the Landsat data
SB_county_south <- st_transform(SB_county_south, crs = st_crs(landsat))

plot(SB_county_south)
## Warning: plotting the first 10 out of 18 attributes; use max.plot = 18 to plot
## all

Crop and mask Landsat data to study area

Crop and mask the Landsat data to our study area in order to save computational time.Additionally, remove any objects we’re no longer working with from our environment to save space.

# crop Landsat scene to the extent of the SB county shapefile
landsat_crop <- crop(landsat, SB_county_south)

# mask the raster to southern portion of SB county
landsat_masked <- mask(landsat_crop, SB_county_south)

# remove unnecessary object from environment
rm(landsat, landsat_crop, SB_county_south)

# plot true color image
plotRGB(landsat_masked, r = 3, g = 2, blue = 1, stretch = "lin")
## Warning in plot.window(...): "blue" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "blue" is not a graphical parameter
## Warning in title(...): "blue" is not a graphical parameter

Convert Landsat values to reflectance

Now we need to convert the values in our raster stack to correspond to reflectance values. To do so, we need to remove erroneous values and apply any scaling factors to convert to reflectance.

In this case, we are working with Landsat Collection 2. The valid range of pixel values for this collection 7,273-43,636, with a multiplicative scale factor of 0.0000275 and an additive scale factor of -0.2. So we reclassify any erroneous values as NA and update the values for each pixel based on the scaling factors. Now the pixel values should range from 0-100%.

summary(landsat_masked)
## Warning: [summary] used a sample
##       blue           green            red             NIR       
##  Min.   : 7675   Min.   : 7543   Min.   : 7274   Min.   : 7248  
##  1st Qu.: 8178   1st Qu.: 8063   1st Qu.: 7667   1st Qu.: 7545  
##  Median : 8387   Median : 8941   Median : 8890   Median :12507  
##  Mean   : 8668   Mean   : 9097   Mean   : 9062   Mean   :11460  
##  3rd Qu.: 8955   3rd Qu.: 9730   3rd Qu.: 9965   3rd Qu.:14306  
##  Max.   :65535   Max.   :26662   Max.   :27885   Max.   :28029  
##  NA's   :39855   NA's   :39855   NA's   :39855   NA's   :39855  
##      SWIR1           SWIR2      
##  Min.   : 7025   Min.   : 6936  
##  1st Qu.: 7419   1st Qu.: 7356  
##  Median :11940   Median : 9824  
##  Mean   :11372   Mean   :10007  
##  3rd Qu.:13962   3rd Qu.:11740  
##  Max.   :25139   Max.   :24752  
##  NA's   :39855   NA's   :39855
# reclassify erroneous values as NA
rcl <- matrix(c(-Inf, 7273, NA,
         43636, Inf, NA),
         ncol = 3, byrow = TRUE)

landsat <- classify(landsat_masked, rcl = rcl)

# adjust values based on scaling factor
landsat <- (landsat * 0.0000275 - 0.2)*100

summary(landsat)
## Warning: [summary] used a sample
##       blue           green            red             NIR       
##  Min.   : 1.11   Min.   : 0.74   Min.   : 0.00   Min.   : 0.23  
##  1st Qu.: 2.49   1st Qu.: 2.17   1st Qu.: 1.08   1st Qu.: 0.75  
##  Median : 3.06   Median : 4.59   Median : 4.45   Median :14.39  
##  Mean   : 3.83   Mean   : 5.02   Mean   : 4.92   Mean   :11.52  
##  3rd Qu.: 4.63   3rd Qu.: 6.76   3rd Qu.: 7.40   3rd Qu.:19.34  
##  Max.   :39.42   Max.   :53.32   Max.   :56.68   Max.   :57.08  
##  NA's   :39856   NA's   :39855   NA's   :39855   NA's   :39856  
##      SWIR1           SWIR2      
##  Min.   : 0.10   Min.   : 0.20  
##  1st Qu.: 0.41   1st Qu.: 0.60  
##  Median :13.43   Median : 8.15  
##  Mean   :11.88   Mean   : 8.52  
##  3rd Qu.:18.70   3rd Qu.:13.07  
##  Max.   :49.13   Max.   :48.07  
##  NA's   :42892   NA's   :46809
# plot true color image to check results
plotRGB(landsat, r = 3, g = 2, blue = 1)
## Warning in plot.window(...): "blue" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "blue" is not a graphical parameter
## Warning in title(...): "blue" is not a graphical parameter

# check values are 0 - 100
max(landsat)
## class       : SpatRaster 
## dimensions  : 773, 2007, 1  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 203175, 263385, 3802755, 3825945  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 11N (EPSG:32611) 
## source(s)   : memory
## name        :     max 
## min value   :  1.3125 
## max value   : 74.4625

Classify image

Extract reflectance values for training data

Load the .shp identifying different locations within our study area as containing one of our 4 land cover types. Then extract the spectral values at each site to create a data frame that relates land cover types to their spectral reflectance.

# read in and transform training data
training_data <- st_read("data/trainingdata.shp") %>% 
  st_transform(., crs = st_crs(landsat))
## Reading layer `trainingdata' from data source 
##   `/Users/melissawidas/Documents/Github/santa_barbarara_land_classification/data/trainingdata.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 40 features and 2 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 215539.2 ymin: 3808948 xmax: 259927.3 ymax: 3823134
## Projected CRS: WGS 84 / UTM zone 11N
# extract reflectance values at training sites
training_data_values <- terra::extract(landsat, training_data, df = TRUE)

# convert training data to data frame
training_data_attributes <- training_data %>% 
  st_drop_geometry()

# join training data attributes and extracted reflectance values
SB_training_data <- left_join(training_data_values, training_data_attributes,
          by = c("ID" = "id")) %>% 
  mutate(type = as.factor(type))

Train decision tree classifier

To train our decision tree, we will establish our model formula (i.e. what our response and predictor variables are). The rpart function implements the CART algorithm. The rpart function needs to know the model formula and training data you would like to use. Because we are performing a classification, we set method = "class". We also set na.action = na.omit to remove any pixels with NAs from the analysis.

To understand how our decision tree will classify pixels, we can plot the results. The decision tree is comprised of a hierarchy of binary decisions. Each decision rule has 2 outcomes based on a conditional statement pertaining to values in each spectral band.

# establish model formula
SB_formula <- type ~ red + green + blue + NIR + SWIR1 + SWIR2
# train decision tree
SB_decision_tree <- rpart(formula = SB_formula,
      data = SB_training_data,
      method = "class",
      na.action = na.omit)

# plot decision tree
prp(SB_decision_tree)

Apply decision tree

Now that we have created our decision tree, we can apply it to our entire image. The terra package includes a predict() function that allows us to apply a model to our data. In order for this to work properly, the names of the layers need to match the column names of the predictors we used to train our decision tree. The predict() function will return a raster layer with integer values. These integer values correspond to the factor levels in the training data. To figure out what category each integer corresponds to, we can inspect the levels of our training data.

# classify image based on decision tree
SB_classification <- predict(landsat, SB_decision_tree,
        type = "class",
        na.rm = TRUE)

SB_classification
## class       : SpatRaster 
## dimensions  : 773, 2007, 1  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 203175, 263385, 3802755, 3825945  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 11N (EPSG:32611) 
## source(s)   : memory
## categories  : class 
## name        :            class 
## min value   : green_vegetation 
## max value   :            water
# inspect level to understand the order of classes in prediction
levels(SB_classification)
## [[1]]
##   value            class
## 1     1 green_vegetation
## 2     2  soil_dead_grass
## 3     3            urban
## 4     4            water

Plot results

Plot the results.

# plot results
tm_shape(SB_classification) +
  tm_raster()
## SpatRaster object downsampled to 621 by 1612 cells.